Introduction

The goal of this tutorial is to show an example where dynamic logic model is used to predict drug response.

The tutorial is based on the paper

Eduati et al (2017) Drug resistance mechanisms in colorectal cancer dissected with cell type-specific dynamic logic models. Cancer Research. DOI: 10.1158/0008-5472.CAN-17-0078

On Github: https://github.com/saezlab/CRC-pathway-biomarkers

We investigate here the drug-response of colorectal cancer cell lines. For this, we use drug response data and a signaling dataset.

knitr::include_graphics("./data/logicODE_tutorial/Eduatietal_Figure1.png")

The Genomics of Drug Sensitivity in Cancer (GDSC), https://www.cancerrxgene.org/ offers drug response data for more than a 1000 human cancer cell lines, for houndreds of drugs. A small part of these data is can be found in "./data/IC50_GDSC.csv".

The perturbation dataset contains the short time signaling response of 14 colorectal cancer cell lines, where 14 phosphoproteins are measured under 43 perturbation conditions (5 stimuli, 7 inhibitors).

First, we construct signaling models based on the perturbation data to the cell lines, here we use the CNORode modelling package. In the next step, we will associate model features to drug response to see why certain cell lines respond to certain drugs and others do not. Here we use a linear modeling framework.

CNORode

CNORode is a member of the CellNOptR logic based tool family. It can translate the network to ordinary differential equation (ODE) model, fit the model parameters to data and make predictions.

Dependencies

These should be already installed from previous tutorial.

# installs devtools package if not already installed
#if(!require("devtools")) install.packages('devtools')

# installs CellNOptR and CNORode from GitHub: 
if(!require("CellNOptR")) remotes::install_github('saezlab/CellNOptR')
if(!require("CNORode")) remotes::install_github('saezlab/CNORode')

if(!require("MEIGOR")){
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  BiocManager::install("MEIGOR")
}

if(!require("dplyr")) install.packages('dplyr')
if(!require("readr")) install.packages('readr')
if(!require("tidyr")) install.packages('tidyr')
if(!require("ggrepel")) install.packages('ggrepel')

If you don’t have devtools and cannot install it, then

  1. please visit the https://github.com/saezlab/CellNOptR and https://github.com/saezlab/CNORode websites,
  2. download the toolboxes by clicking “Clone or download” then “Download Zip”
  3. Unzip the files
  4. In RStudio run:
    install.packages("../CellNOptR-master", repos = NULL, type = "source")
    install.packages("../CNORode-master", repos = NULL, type = "source")

Make sure to import the libraries

library(CellNOptR)
library(CNORode)
library(MEIGOR)

library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)
library(ggrepel)

PART I: DRUG response exploration

IC50 <- readr::read_csv("./data/logicODE_tutorial/IC50_GDSC.csv") %>% rename("cell_line" = "X1")
## Warning: Missing column names filled in: 'X1' [1]
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   X1 = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
print(IC50)
## # A tibble: 14 x 31
##    cell_line  Gefitinib  RDEA119 `CI-1040` Afatinib `Nutlin-3a` `PD-0325901`
##    <chr>          <dbl>    <dbl>     <dbl>    <dbl>       <dbl>        <dbl>
##  1 CAR1           1.36   1.34        3.27   -0.831        5.51        -0.235
##  2 CCK81         -0.659 NA           2.74   -1.61         0.551       -1.72 
##  3 COLO320HSR     1.46   3.78        4.15    1.22         4.26         0.658
##  4 DIFI          -3.26   0.00908     1.61   -5.31         4.44        -1.97 
##  5 HCT116         1.03  -0.510       1.46   -0.0704       1.14        -1.56 
##  6 HT115          0.444  0.946       2.96   -0.640        4.55        -1.41 
##  7 HT29           1.91  -2.24        0.230   0.849        4.68        -3.74 
##  8 SKCO1         NA     -1.49       NA      NA            1.97        -4.58 
##  9 SNUC2B         0.375  1.55        1.94    0.113        3.45        -1.44 
## 10 SNUC5          2.13   0.246       1.79    0.739        5.01        -3.26 
## 11 SW1116        NA     NA          NA      NA           NA           NA    
## 12 SW1463        NA     NA          NA      NA           NA           NA    
## 13 SW620          1.51  -0.802      -0.104   1.36         4.28        -3.73 
## 14 SW837         NA     NA          NA      NA           NA           NA    
## # … with 24 more variables: SB590885 <dbl>, AZD6244 <dbl>, BMS-536924 <dbl>,
## #   Cetuximab <dbl>, HG-5-88-01 <dbl>, (5Z)-7-Oxozeaenol <dbl>,
## #   Trametinib <dbl>, Dabrafenib <dbl>, Afatinib (rescreen) <dbl>,
## #   AZD6244.1 <dbl>, RDEA119 (rescreen) <dbl>, BMS-754807 <dbl>, OSI-906 <dbl>,
## #   BMS-345541 <dbl>, Ruxolitinib <dbl>, LY317615 <dbl>, XMD14-99 <dbl>,
## #   CP724714 <dbl>, CH5424802 <dbl>, EKB-569 <dbl>, OSI-930 <dbl>,
## #   QL-XI-92 <dbl>, EX-527 <dbl>, THZ-2-102-1 <dbl>
IC50 %>% gather(drug_name, log_IC50, -cell_line) %>%
  ggplot() +
  geom_point(aes(drug_name, log_IC50,col=cell_line)) + 
  coord_flip() + 
  theme_bw() + 
  ggtitle("Drug response of 14 CRC cell lines")
## Warning: Removed 49 rows containing missing values (geom_point).

Form the raw IC50 values we can see that there are some drugs that are more effective (Trametinib) than others, like XMD14-99. There are also cell-line differences, for example, DIFI shows stronger sensitivity to Afatinib than any other cell lines. What could be the reason for this?

PART II: cell-line models

The goal of part II is to build a cell-line specific model from the perturbation data using CNORode.

This model is an ordinary differential equation (ODE) model, where the equation for each state (\(x_A\)) can be written as \[\frac{dx_A}{dt} = \tau_A(B(f_1(x),f_2(x),...)-x_A) \] here

The main differences are that in ODE models the states are continuous values, therefore it is quantitative, not only qualitative like a Boolean model. Further, here we hace to find the specific edge and node parameters.

What do we need to build and simulate a differential equation model?

In this example, the baseline is set to 0.5. A value of 1 means full activation and 0 means full inhibition of the node.

Perturbation data

The following heatmap shows an overview on the perturbation data. The first block outlines the combinations of treatment. Then each other block represents the response of a cell line. Different columns within a block shows the different phosphoprotein markers.

In the tutorial we make a single model for the first cell line COLO320HSR.

Model a single cell line

Similarly to the previous tutorial with CellNopt, here we also start by importing a prior knowledge network and the perturbation data in MIDAS format.

# load Prior Knowledge Network (PKN)
pknmodel <- readSIF("./data/logicODE_tutorial/PKN.sif")

# load normalised perturbation data 
# select MIDAS file for the desired cell line
MIDASfile <- "./data/logicODE_tutorial/processed/MIDAS/MD-COLO320HSR_Ktuned_v1_n4_all_noEGFRi_CNORode.csv"

Mydata <- readMIDAS(MIDASfile=MIDASfile,verbose = FALSE)
cnolist <- makeCNOlist(Mydata, subfield=F)
## [1] "Please be aware that if you only have some conditions at time zero (e.g.only inhibitor/no inhibitor), the measurements for these conditions will be copied across matching conditions at t=0"
cnolist$valueStimuli[ cnolist$valueStimuli==0 ]=0.5

Show the network first

plotModel(pknmodel,cnolist)

As in CellNOPt:

  • green nodes are stimulated in some experiments
  • blue nodes are measured
  • white nodes are modelled, but not measured
  • red nodes or red bordered nodes are occasionally inhibited
  • black edges represents activation, red T-shaped arrows represents inhibition

Then the data in CellNOpt format:

plotCNOlist(cnolist)

The data is very large, therefore is hard to see the details, but we can notice as some nodes increases their activity at the final time (30 mins).

# compress the network (no expansion, only OR gates are considered)
model <- preprocessing(data=cnolist, model=pknmodel, compression=TRUE, expansion=FALSE)
## [1] "The following species are measured: AKT, EGFR, ERK, GSK3, IkBa, JNK, MEK, PI3K, RPS6, RSKp90, SMAD2, cJun, mTOR, p38, TAK1, TGFRb, IRS1, M3K1, MP2K4, S6K, TBK1, IKK, RASK, BRAF, PAK1"
## [1] "The following species are stimulated: PLX, EGF, HGF, IGF1, TGFb, TNFa"
## [1] "The following species are inhibited: IKK, MEK, PI3K, BRAF, TAK1, TBK1, mTOR"
## [1] "The following species are not observable and/or not controllable: "
# set initial parameters (here parameters 'k' and 'tau' are optimised and 'n' fixed to 3)
ode_parameters <- createLBodeContPars(model, 
                                      LB_n = 1, LB_k = 0, LB_tau = 0,
                                      UB_n = 3, UB_k = 1, UB_tau = 1,
                                      default_n = 3,
                                      default_k = 0.5,
                                      default_tau = 0.01,
                                      opt_n = FALSE, opt_k = TRUE, opt_tau = TRUE,
                                      random = TRUE)

# PLX -> BRAF is an artificial regulation used to model paradoxical effect of PLX4720,
# which works as selective BRAF inhibitor in cell-lines where BRAF is mutated in
# V600E (i.e. HT29 and SNUC5 in our panel), but induces a paradoxical activation
# of wild type BRAF cells (modeled as stimulus on those cell lines)
ode_parameters$parValues[which(ode_parameters$parNames=="PLX_k_BRAF")]<-0.5
ode_parameters$index_opt_pars<- setdiff(ode_parameters$index_opt_pars,
                                        which(ode_parameters$parNames=="PLX_k_BRAF"))

## Parameter Optimization
# essm
paramsSSm=defaultParametersSSm()
paramsSSm$local_solver = "DHC"
paramsSSm$maxtime = 30; #36000;
paramsSSm$transfer_function = 4;

The actual optimisation takes around 10 mins, instead of the 30 sec.  So, instead of running it here, we just load the results:

opt_pars = parEstimationLBode(cnolist, model, method="essm",
                              ode_parameters=ode_parameters,
                              paramsSSm=paramsSSm)
#write_rds(opt_pars,"data/logicODE_tutorial/opt_pars_30sec.RDS")
opt_pars <- read_rds("data/logicODE_tutorial/opt_pars_30sec.RDS")

Plot the fit of the model:

sim_res <- CNORode::plotLBodeFitness(cnolist,model,ode_parameters = opt_pars)

pdf("./CNORode_fit_tmp.pdf",width = 30,height = 20)
sim_res <- CNORode::plotLBodeFitness(cnolist,model,ode_parameters = opt_pars)
dev.off()
## quartz_off_screen 
##                 2

The fit is a bit better than a random model, but these optimisations should be run for around 10 hours.

We are interested in the optimised model parameters of this model.

opt_par_values <- opt_pars$parValues
names(opt_par_values) <- opt_pars$parNames

opt_par_values[opt_pars$index_opt_pars]
##  TGFRb_k_TAK1   TNFa_k_TAK1      tau_TAK1  TGFb_k_TGFRb  EGFR_k_TGFRb 
##  6.290662e-01  0.000000e+00  7.787088e-03  2.802733e-01  1.542701e-01 
##     tau_TGFRb    EGF_k_EGFR    ERK_k_EGFR      tau_EGFR    S6K_k_IRS1 
##  0.000000e+00  0.000000e+00  2.007754e-07  7.015971e-03  0.000000e+00 
##   TBK1_k_IRS1   mTOR_k_IRS1   IGF1_k_IRS1    MEK_k_IRS1      tau_IRS1 
##  0.000000e+00  0.000000e+00  0.000000e+00  4.060043e-07  0.000000e+00 
##    PI3K_k_AKT       tau_AKT   PI3K_k_M3K1   RASK_k_M3K1      tau_M3K1 
##  4.100319e-02  0.000000e+00  4.432509e-01  5.075054e-08  1.568557e-09 
##    ERK_k_RPS6    S6K_k_RPS6 RSKp90_k_RPS6      tau_RPS6     IKK_k_ERK 
##  2.498247e-01  0.000000e+00  8.162135e-01  1.027507e-02  3.065270e-01 
##     MEK_k_ERK       tau_ERK  TAK1_k_MP2K4  M3K1_k_MP2K4  TNFa_k_MP2K4 
##  1.271568e-01  9.911941e-03  1.000000e+00  0.000000e+00  0.000000e+00 
##     tau_MP2K4    mTOR_k_S6K    PI3K_k_S6K       tau_S6K    IKK_k_TBK1 
##  0.000000e+00  5.503381e-01  0.000000e+00  1.462364e-05  6.452658e-01 
##      tau_TBK1    AKT_k_mTOR      tau_mTOR    TAK1_k_IKK     AKT_k_IKK 
##  9.931725e-03  0.000000e+00  0.000000e+00  8.246211e-01  0.000000e+00 
##    M3K1_k_IKK    TNFa_k_IKK    TBK1_k_IKK       tau_IKK   EGFR_k_PI3K 
##  0.000000e+00  0.000000e+00  8.707054e-05  0.000000e+00  1.599331e-01 
##   IRS1_k_PI3K   RPS6_k_PI3K   TNFa_k_PI3K   IGF1_k_PI3K    HGF_k_PI3K 
##  0.000000e+00  1.000000e+00  0.000000e+00  0.000000e+00  4.751120e-01 
##   RASK_k_PI3K      tau_PI3K   EGFR_k_RASK   IGF1_k_RASK    HGF_k_RASK 
##  3.009077e-02  3.042961e-02  0.000000e+00  0.000000e+00  5.156353e-07 
##      tau_RASK    BRAF_k_MEK    PAK1_k_MEK       tau_MEK    ERK_k_BRAF 
##  0.000000e+00  7.505019e-03  0.000000e+00  0.000000e+00  5.244096e-02 
##   RASK_k_BRAF   PAK1_k_BRAF    PLX_k_BRAF      tau_BRAF  ERK_k_RSKp90 
##  0.000000e+00  9.269049e-02  3.603077e-06  0.000000e+00  9.381897e-01 
##    tau_RSKp90    TAK1_k_JNK    IRS1_k_JNK    M3K1_k_JNK    TNFa_k_JNK 
##  1.471951e-02  0.000000e+00  3.617738e-05  0.000000e+00  8.352255e-02 
##   MP2K4_k_JNK       tau_JNK    AKT_k_PAK1   PI3K_k_PAK1      tau_PAK1 
##  0.000000e+00  0.000000e+00  5.496838e-01  0.000000e+00  0.000000e+00 
##    TAK1_k_p38   MP2K4_k_p38    TBK1_k_p38       tau_p38 TGFRb_k_SMAD2 
##  2.582269e-01  0.000000e+00  8.704423e-01  1.299676e-02  0.000000e+00 
##     tau_SMAD2    AKT_k_GSK3 RSKp90_k_GSK3      tau_GSK3   TAK1_k_IkBa 
##  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  4.080694e-01 
##    IKK_k_IkBa      tau_IkBa    JNK_k_cJun      tau_cJun 
##  7.350341e-01  1.539245e-02  0.000000e+00  0.000000e+00

Similar to the above cell-line, we can build a model for each of the cell lines. This is very time consuming, therefore we just load the optimised parameters from the paper.

optimised_parameters <- read_delim("./data/logicODE_tutorial/allModelsParameters.txt",delim = "\t")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   cell_line = col_character()
## )
## ℹ Use `spec()` for the full column specifications.

Let’s check edge and node parameters.

optimised_parameters %>% select(cell_line, contains("_k_")) %>%
  column_to_rownames("cell_line") %>% as.matrix() %>% heatmap()

optimised_parameters %>% select(cell_line, contains("tau_")) %>%
  column_to_rownames("cell_line") %>% as.matrix() %>% heatmap()

The level of edge and node parameters differs across the cell-lines.

PART III: Associate model parameters and drug response

Which model parameters correlates with drug response IC50 ?

parameters_td <- optimised_parameters %>% gather(model_parameter, par_value, -cell_line)
drugs_td <- IC50 %>% gather(drug, IC50, -cell_line)

all_data <- full_join(parameters_td,drugs_td,by="cell_line")


# for each drug and each parameter compute the correlation coefficient

corr_data <- all_data %>% 
  group_by(drug, model_parameter) %>%
  nest() %>%
  summarise(corr_data = purrr::map(data,function(df){
    cor.test(formula = ~IC50+par_value,data=df) %>% broom::tidy()
  }))
  

corr_data <- corr_data %>%  unnest(corr_data)
corr_data
## # A tibble: 2,640 x 10
## # Groups:   model_parameter [88]
##    model_parameter drug  estimate statistic p.value parameter conf.low conf.high
##    <chr>           <chr>    <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl>
##  1 AKT_k_GSK3      (5Z)…   0.489      1.94  0.0761         12  -0.0563    0.809 
##  2 AKT_k_GSK3      Afat…  -0.619     -2.23  0.0562          8  -0.899     0.0169
##  3 AKT_k_GSK3      Afat…  -0.439     -1.69  0.117          12  -0.786     0.120 
##  4 AKT_k_GSK3      AZD6…  -0.264     -0.820 0.433           9  -0.746     0.399 
##  5 AKT_k_GSK3      AZD6…   0.691      3.31  0.00625        12   0.253     0.894 
##  6 AKT_k_GSK3      BMS-…  -0.0457    -0.152 0.882          11  -0.582     0.518 
##  7 AKT_k_GSK3      BMS-…   0.102      0.355 0.729          12  -0.453     0.600 
##  8 AKT_k_GSK3      BMS-…   0.572      2.09  0.0657          9  -0.0418    0.873 
##  9 AKT_k_GSK3      Cetu…  -0.372     -1.39  0.191          12  -0.754     0.198 
## 10 AKT_k_GSK3      CH54…   0.0909     0.303 0.768          11  -0.484     0.611 
## # … with 2,630 more rows, and 2 more variables: method <chr>, alternative <chr>

Let’s show some correlation

corr_data %>% arrange(desc(abs(estimate))) %>% print(.,n=25)
## # A tibble: 2,640 x 10
## # Groups:   model_parameter [88]
##    model_parameter drug  estimate statistic p.value parameter conf.low conf.high
##    <chr>           <chr>    <dbl>     <dbl>   <dbl>     <int>    <dbl>     <dbl>
##  1 tau_TBK1        Afat…   -0.896     -5.72 4.44e-4         8   -0.975    -0.612
##  2 tau_ERK         Afat…   -0.875     -5.12 9.10e-4         8   -0.970    -0.547
##  3 tau_RPS6        BMS-…    0.861      5.07 6.70e-4         9    0.539     0.963
##  4 tau_ERK         Gefi…   -0.859     -4.74 1.46e-3         8   -0.966    -0.499
##  5 tau_RPS6        OSI-…    0.846      4.76 1.03e-3         9    0.500     0.959
##  6 PAK1_k_MEK      Afat…    0.834      4.28 2.70e-3         8    0.431     0.960
##  7 IKK_k_ERK       RDEA…    0.818      4.02 3.86e-3         8    0.387     0.955
##  8 tau_TBK1        Gefi…   -0.813     -3.95 4.26e-3         8   -0.954    -0.375
##  9 tau_IkBa        SB59…   -0.799     -3.98 3.19e-3         9   -0.946    -0.382
## 10 tau_TBK1        Afat…   -0.784     -4.38 9.01e-4        12   -0.928    -0.434
## 11 PI3K_k_S6K      CI-1…    0.777      3.49 8.17e-3         8    0.289     0.945
## 12 M3K1_k_IKK      CI-1…   -0.776     -3.48 8.38e-3         8   -0.944    -0.285
## 13 tau_ERK         Afat…   -0.762     -4.08 1.53e-3        12   -0.921    -0.389
## 14 ERK_k_BRAF      HG-5…    0.756      3.26 1.15e-2         8    0.240     0.939
## 15 ERK_k_RPS6      HG-5…   -0.754     -3.25 1.18e-2         8   -0.938    -0.237
## 16 AKT_k_GSK3      PD-0…    0.732      3.23 1.04e-2         9    0.236     0.926
## 17 PI3K_k_S6K      RDEA…    0.731      3.03 1.62e-2         8    0.189     0.932
## 18 PI3K_k_S6K      PD-0…    0.721      3.13 1.22e-2         9    0.214     0.922
## 19 RSKp90_k_GSK3   SB59…   -0.717     -3.09 1.30e-2         9   -0.921    -0.206
## 20 PI3K_k_S6K      (5Z)…    0.717      3.56 3.89e-3        12    0.301     0.904
## 21 MP2K4_k_JNK     RDEA…    0.712      2.87 2.10e-2         8    0.149     0.926
## 22 ERK_k_BRAF      OSI-…    0.708      3.01 1.47e-2         9    0.188     0.918
## 23 tau_MEK         Gefi…    0.708      2.83 2.20e-2         8    0.141     0.925
## 24 MP2K4_k_JNK     (5Z)…    0.707      3.46 4.71e-3        12    0.282     0.900
## 25 tau_SMAD2       THZ-…    0.706      3.31 7.01e-3        11    0.254     0.905
## # … with 2,615 more rows, and 2 more variables: method <chr>, alternative <chr>
all_data %>% filter(drug=="Afatinib",model_parameter=="tau_TBK1") %>%
  ggplot(aes(par_value,IC50)) + 
  geom_point() + 
  ggrepel::geom_label_repel(aes(label = cell_line)) +
  ggtitle("DRUG: Afatinib; parameter: tau_TBK1")
## Warning: Removed 4 rows containing missing values (geom_point).
## Warning: Removed 4 rows containing missing values (geom_label_repel).

Think what the problem might be here?
We have only 14 cell-lines, therefore each of the correlations between model parameter and drug IC50 is based on 14 data points. There are 31 drugs and 89 model parameters, which results in 31*89=2759 tests.

Also this is only a single parameter - single drug association. It is possible, that the existence of multiple edges makes a cell-line sensitive/resistant. Therefore (Eduati et al) derived linear models, that funds multiple parameters at the same time.

knitr::include_graphics("./data/logicODE_tutorial/Eduatietal_Figure5.png")